16 March, 2016
Nonspatial data has no location information
nonspatial = data.frame( id=c(1,2,3,4), data=rnorm(4) ) print(nonspatial)
## id data ## 1 1 -0.4081061 ## 2 2 -0.8422845 ## 3 3 -0.2370283 ## 4 4 1.6211359
Spatial data has location information
The simplest spatial data are points on a map
spatial = data.frame( id=c(1,2,3,4), data=rnorm(4), x=runif(4,-180,180), y=runif(4,-90,90) ) print(spatial)
## id data x y ## 1 1 -0.2084150 -172.61333 -71.594921 ## 2 2 -0.5754466 -29.24202 4.005584 ## 3 3 -0.1646887 -44.08515 89.482876 ## 4 4 -1.4658640 -20.39301 -50.607725
Which we can convert to explicitly spatial data using the sp package. Most GIS packages in R store data as sp classes.
library(sp)
The sp package has a method called coordinates that converts points to an sp class.
coordinates(spatial) = ~ x + y class(spatial)
## [1] "SpatialPointsDataFrame" ## attr(,"package") ## [1] "sp"
plot(spatial, axes=T)
Spatial data also needs a reference system or "projection" that allows us to represent spatial features on a map. Projections can be thought of as simply a coordinate system with an origin that is relative to a known point in space.
This is a whole field of mathematically intensive study termed "geodesy"
Much of the field of geodesy is jam-packed in the rgdal and sp packages, which use the PROJ.4 library for projections and transformations.
## will automatically load sp for us too library(rgdal)
Wonderful document on CRS in R here
rgdal includes a comprehensive list of projections that are typically represented as a string of parameters according to the PROJ.4 specifications. The simplest is a latitude/longitude system denoted as
"+proj=longlat"
To define the projection for spatial, we write to its proj4string slot:
proj4string(spatial) = "+proj=longlat"
Projections are a necessary evil for GIS users (to be continued)
With a projection associated with our spatial data, we can now relate it to other spatial data. In other words, let's make a map!
library(leaflet)
m = leaflet(data=spatial) %>% addTiles() %>% addMarkers() m
Vector = Polygons
Raster = Grid
Vector = Discrete
Raster = Continuous
Vector = Illustrator/Inkscape
Raster = Photoshop/GIMP
Whether, points, lines or polygons, comprised of coordinate pairs
Think like a table, each row (observation) is referred to as a feature.
The other data is often referred to as the attribute table or as attribute data.
Each row is one observation (one feature), this becomes important later, as we'll need to refer to individual features or to the entire geometry.
Data Input/Output
library(rgdal)
soils = readOGR(
dsn="data",
layer="soilsData")
## OGR data source with driver: ESRI Shapefile ## Source: "data", layer: "soilsData" ## with 71 features ## It has 27 fields
writeOGR(
soils,
"data",
"soilsData_out",
driver="ESRI Shapefile"
)
Other ways of creating of spatial data from list of coordinates:
wells = read.delim("./data/WellLocations.tsv")
class(wells); head(wells)
## [1] "data.frame"
## x y pts.data.id ## 1 -90.05145 43.10047 1 ## 2 -90.05553 43.10470 2 ## 3 -90.07305 43.09013 3 ## 4 -90.04716 43.08454 4 ## 5 -90.07198 43.08850 5 ## 6 -90.06599 43.09197 6
coordinates(wells) <- ~ x + y class(wells)
## [1] "SpatialPointsDataFrame" ## attr(,"package") ## [1] "sp"
Helper functions:
class(soils)
## [1] "SpatialPolygonsDataFrame" ## attr(,"package") ## [1] "sp"
slotNames(soils)
## [1] "data" "polygons" "plotOrder" "bbox" "proj4string"
length(soils)
## [1] 71
Accessing attribute data
str(soils@data[,1:10])
## 'data.frame': 71 obs. of 10 variables: ## $ mukey : Factor w/ 25 levels "2774742","2774772",..: 12 19 12 6 9 5 7 24 25 13 ... ## $ muarcrs: Factor w/ 71 levels "0.40538405","0.90105194",..: 29 59 67 17 8 25 18 5 14 22 ... ## $ Sand1 : num 21.5 11 21.5 12 29.5 ... ## $ Sand2 : num 35.29 8.34 35.29 9.02 38.22 ... ## $ Sand3 : num 45.09 7.56 45.09 16.59 64.52 ... ## $ Sand4 : num 48.6 30.6 48.6 36.7 33.2 ... ## $ Sand5 : num 0 31.1 0 27.1 57.2 ... ## $ Silt1 : num 45.8 65.2 45.8 68.7 54.5 ... ## $ Silt2 : num 37.3 64.9 37.3 60.3 43.5 ... ## $ Silt3 : num 36.7 64.1 36.7 34 21.1 ...
The geometry stored in the polygons slot
str(soils@polygons[1])
## List of 1 ## $ :Formal class 'Polygons' [package "sp"] with 5 slots ## .. ..@ Polygons :List of 1 ## .. .. ..$ :Formal class 'Polygon' [package "sp"] with 5 slots ## .. .. .. .. ..@ labpt : num [1:2] 514199 291168 ## .. .. .. .. ..@ area : num 10776 ## .. .. .. .. ..@ hole : logi FALSE ## .. .. .. .. ..@ ringDir: int 1 ## .. .. .. .. ..@ coords : num [1:21, 1:2] 514211 514206 514195 514178 514180 ... ## .. ..@ plotOrder: int 1 ## .. ..@ labpt : num [1:2] 514199 291168 ## .. ..@ ID : chr "0" ## .. ..@ area : num 10776
A number of common functions have methods for spatial data
silty = subset(soils, Silt1 > 70)
paste("There are", length(soils), "soil features total;")
## [1] "There are 71 soil features total;"
paste(length(silty), "with a silt percentage over 70")
## [1] "10 with a silt percentage over 70"
Making simple maps is quite easy
par(mfrow=c(1,2), bg=NA)
plot(
soils,
main="Soils Polygons",
col=rainbow(5))
plot(
wells,
main="Well Data",
col='red'
)
A coordinate reference system (CRS) defines the surface of the world. If they don't match, errors and issues can arise
To illustrate issues, where are the wells in relation to the soil?
soils = readOGR(
dsn="data",
layer="soilsData")
plot(
soils,
main="Soils"
)
plot(
wells,
add=T
)
print(wells@proj4string)
## CRS arguments: NA
print(soils@proj4string)
## CRS arguments: ## +proj=tmerc +lat_0=0 +lon_0=-90 +k=0.9996 +x_0=520000 ## +y_0=-4480000 +datum=NAD83 +units=m +no_defs +ellps=GRS80 ## +towgs84=0,0,0
coordinates(soils)[c(1,72)]
## [1] 514198.6 291168.3
coordinates(wells)[c(1,16)]
## [1] -90.05145 43.10047
Solution: define the CRS then project the points to CRS of the soils data
wells@proj4string = CRS(
"+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
wells = spTransform(
wells,
soils@proj4string)
plot(soils)
plot(
wells,
add=T,
cex=2,
col='red',
pch=19
)
How do two shapes relate? Sometimes we want to have a yes/no true/false answer to how two geometries relate. Spatial Predicates is the term for this
All contained in the rgeos package. Here we see gIntersects used on these two polygons. gEquals, gTouches, gCrosses, gContains and others.
gIntersects(soils, wells)
## [1] TRUE
head(gIntersects(wells, soils, byid=T, returnDense=T))
## 1 2 3 4 5 6 7 8 9 10 11 12 ## 0 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## 1 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## 2 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## 3 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## 4 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## 5 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## 13 14 15 ## 0 FALSE FALSE FALSE ## 1 FALSE FALSE FALSE ## 2 FALSE FALSE FALSE ## 3 FALSE FALSE FALSE ## 4 FALSE FALSE FALSE ## 5 FALSE FALSE FALSE
gIntersects(wells, soils, byid=T, returnDense=F)[1:2]
## $`1` ## [1] 66 ## ## $`2` ## [1] 36
gIntersects(wells, soils, byid=T, returnDense=F)[1:4]
## $`1` ## [1] 66 ## ## $`2` ## [1] 36 ## ## $`3` ## [1] 22 ## ## $`4` ## [1] 29
indxs <- gIntersects(
wells,
soils,
byid=T,
returnDense=F) %>%
unlist %>%
unique
soils_with_wells = soils[indxs,]
plot(
soils_with_wells
)
plot(
wells,
add=T,
cex=2,
pch=19,
col='red'
)
Extract soils data to our point data…
soilsDat = over(wells, soils) class(soilsDat)
## [1] "data.frame"
soilsDat$id = 1:15 wells = merge(wells, soilsDat, by.x="pts.data.id", by.y="id") head(wells@data)
## pts.data.id mukey muarcrs Sand1 Sand2 Sand3 Sand4 ## 1 1 2806652 405.22965906 35.87900 32.747667 69.83400 73.27298 ## 2 2 2806642 127.65139889 14.10700 14.001433 13.06847 14.28767 ## 3 3 2774772 23.43535446 10.51290 9.529000 10.50538 24.74279 ## 4 4 2774821 9.99030648 13.44400 11.495200 9.31480 30.69747 ## 5 5 2774777 4.35597440 12.03429 9.015357 16.59286 36.73413 ## 6 6 2774798 8.69344646 21.47179 35.293641 45.09015 48.59615 ## Sand5 Silt1 Silt2 Silt3 Silt4 Silt5 Clay1 Clay2 ## 1 73.07354 49.00485 40.71233 15.42715 13.57983 14.06340 15.11615 26.54000 ## 2 15.86000 71.64800 71.62823 70.58203 68.64867 69.06100 14.24500 14.37033 ## 3 30.45305 70.65485 65.64850 65.46994 35.22879 54.23537 18.83225 24.82250 ## 4 48.58000 68.94600 67.97213 66.78053 47.81453 35.96000 17.61000 20.53267 ## 5 27.11190 68.71821 60.28321 34.02286 39.86052 41.12143 19.24750 30.70143 ## 6 0.00000 45.78205 37.27431 36.66754 28.13462 0.00000 32.74615 27.43205 ## Clay3 Clay4 Clay5 OM1 OM2 OM3 OM4 ## 1 14.73885 13.14719 12.86306 1.7208333 0.3348077 0.2575000 0.2519231 ## 2 16.34950 17.06367 15.07900 2.1704167 1.9599167 2.7409167 1.8906667 ## 3 24.02468 40.02842 15.31158 0.9820000 0.2550000 0.2550395 0.2535526 ## 4 23.90467 21.48800 15.46000 2.7300000 1.6890000 0.5740000 0.4036667 ## 5 49.38429 23.40536 31.76667 1.2584821 0.2544643 0.2500000 0.2500000 ## 6 18.24231 13.01282 0.00000 0.8837179 0.2500000 0.2500000 0.2243590 ## OM5 Depth1 Depth2 Depth3 Depth4 Depth5 ## 1 0.2500 39 40 40 40 40 ## 2 0.2575 30 31 30 31 31 ## 3 0.2500 40 40 41 40 41 ## 4 0.2600 30 31 31 31 31 ## 5 0.2500 28 29 28 29 29 ## 6 0.0000 39 40 40 40 40
Perhaps we want to expand our our of concern
## buffer of 50 meters (because of our projection) around each well wells_area = gBuffer(wells, byid = T, width = 50) class(wells_area)
## [1] "SpatialPolygonsDataFrame" ## attr(,"package") ## [1] "sp"
soils@data = subset(soils@data, select = c("Sand1", "Silt1"))
head(over(wells_area, soils, fn = mean))
## Sand1 Silt1 ## 1 24.84412 58.12692 ## 2 12.96008 68.01767 ## 3 11.08520 69.97385 ## 4 13.62663 68.09750 ## 5 12.03429 68.71821 ## 6 16.56465 57.53745
We use the gIntersection function, this returns that area to us xyinter = gIntersection(x, y). Other functions like gDifference, gUnion, or gSymdifference also return geometry.
Grab data within a certain range of the middle? Note how this returns geometry
cent = gCentroid(soils) bff = gBuffer(cent, width = 250) int = gIntersection(soils, bff, byid = T) plot(int, col = rainbow(5))
A raster grid is rectangular.
Grid is another word for matrix.
Grid is another word for image.
A GIS raster grid is a matrix/image with an associated location and projection.
At a minimum, a GIS raster grid contains:
The rgdal rgdal packages is primarily for I/O and projecting GIS data
library(rgdal)
The raster package does everything rgdal does, but it includes lots of additional functionality.
library(raster)
elev = readGDAL("data/dem_wi.tif")
writeGDAL(elev, "data/dem_wi_out.tif")
elev = raster("data/dem_wi.tif")
writeRaster(elev, "data/dem_wi_out.tif")
The raster object elev has all the necessary pieces of spatial information:
elev
## class : RasterLayer ## dimensions : 284, 387, 109908 (nrow, ncol, ncell) ## resolution : 0.01666667, 0.01666667 (x, y) ## extent : -93.03262, -86.58262, 42.3949, 47.12823 (xmin, xmax, ymin, ymax) ## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 ## data source : /home/devans/Documents/GeRgraphyPresentation/data/dem_wi.tif ## names : dem_wi ## values : 175, 565.4104 (min, max)
Which means we can make a map!
m = leaflet() %>% addTiles() %>% addRasterImage(elev, opacity=0.5) m
Remember that rasters are just matrices!
Therefore, most matrix operations can be applied to rasters. For example:
plot(
elev > 400,
col=c("red", "blue")
)
Rasters can be easily converted to matrices to do more complex work.
lat_grad = apply( as.matrix(elev), 1, mean, na.rm=T ) plot(lat_grad, type="l")
Most raster analysis ultimately executes some sort of overlay.
The issue:
To overlay two or more rasters, their projections, extents, and cellsizes must align perfectly.
This can be a difficult task.
What is the highest point in each county?
# Pseudo-code 1. Read in elevation data (raster grid) 2. Read in county boundary data (polygons) 3. Convert counties to raster grid that aligns with elevation grid 4. Find maximum elevation gridcell within each county
counties = readOGR("data", "WI_Counties")
## OGR data source with driver: ESRI Shapefile ## Source: "data", layer: "WI_Counties" ## with 72 features ## It has 7 fields
elev
## class : RasterLayer ## dimensions : 284, 387, 109908 (nrow, ncol, ncell) ## resolution : 0.01666667, 0.01666667 (x, y) ## extent : -93.03262, -86.58262, 42.3949, 47.12823 (xmin, xmax, ymin, ymax) ## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 ## data source : /home/devans/Documents/GeRgraphyPresentation/data/dem_wi.tif ## names : dem_wi ## values : 175, 565.4104 (min, max)
proj4string(counties)
## [1] "+proj=tmerc +lat_0=0 +lon_0=-90 +k=0.9996 +x_0=520000 +y_0=-4480000 +ellps=GRS80 +units=m +no_defs"
proj4string(elev)
## [1] "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
extent(counties)
## class : Extent ## xmin : 294839 ## xmax : 770036.4 ## ymin : 225108.8 ## ymax : 734398.4
extent(elev)
## class : Extent ## xmin : -93.03262 ## xmax : -86.58262 ## ymin : 42.3949 ## ymax : 47.12823
cty_grid = rasterize(counties, elev, field="COUNTY_FIP") summary(cty_grid)
## layer ## Min. NA ## 1st Qu. NA ## Median NA ## 3rd Qu. NA ## Max. NA ## NA's 109908
prj = proj4string(elev) cty_prj = spTransform(counties, prj)
extent(cty_prj)
## class : Extent ## xmin : -92.88924 ## xmax : -86.8048 ## ymin : 42.49197 ## ymax : 47.08077
extent(elev)
## class : Extent ## xmin : -93.03262 ## xmax : -86.58262 ## ymin : 42.3949 ## ymax : 47.12823
plot(elev) plot(cty_prj, add=TRUE)
cty_grid = rasterize(counties, elev, field="COUNTY_FIP") summary(cty_grid)
## layer ## Min. NA ## 1st Qu. NA ## Median NA ## 3rd Qu. NA ## Max. NA ## NA's 109908
extent(cty_grid)
## class : Extent ## xmin : -93.03262 ## xmax : -86.58262 ## ymin : 42.3949 ## ymax : 47.12823
extent(elev)
## class : Extent ## xmin : -93.03262 ## xmax : -86.58262 ## ymin : 42.3949 ## ymax : 47.12823
library(dplyr)
ovly = data.frame(
elev=getValues(elev),
cty=getValues(cty_grid)
)
hi_pt = ovly %>%
group_by(cty) %>%
mutate(
elev = (elev == max(elev, na.rm=T)) * elev
) %>%
ungroup()
elev = setValues(elev, hi_pt[["elev"]])
elev[elev == 0] = NA
hi_pt_sp = rasterToPoints(elev, spatial=T)
Scenario tasked with:
Create map of percent Democratic votes by ward
Also display percent turnout
wards = readOGR("data", "WardData")
## OGR data source with driver: ESRI Shapefile ## Source: "data", layer: "WardData" ## with 280 features ## It has 67 fields
options(stringsAsFactors=F)
source("./misc_scripts/function_proper_legend.r")
library(rgdal)
library(rgeos)
library(foreign)
library(classInt)
library(RColorBrewer)
library(scales)
wards@data$SEN_PERC_DEM = with(wards@data, SEN_DEM/SEN_TOT)
wards@data$SEN_PERC_TURN = with(wards@data, SEN_TOT/PERSONS1)
wards_centroids = gCentroid(wards, byid=T)
wards_centroids = SpatialPointsDataFrame(
gCentroid(wards, byid=T),
over(wards_centroids, wards)
)
## defining number of classes
num_classes = 6
## the color palette
pal = brewer.pal(num_classes, "RdBu")
## the class intervals to use for the colors
class_ints = classIntervals(
wards@data$SEN_PERC_DEM,
num_classes,
style="quantile")
## grab the colors for plotting
colrs = findColours(class_ints, pal)
## Custom legend formatting
source("./misc_scripts/function_proper_legend.r")
legtxt = properLegend(colrs, 2)
plot(wards,
col=colrs,
main="Senate 24",
border=NA)
plot(wards_centroids,
pch=20,
cex=(wards_centroids@data$SEN_PERC_TURN),
col=alpha('black', 0.5),
add=T)
legend("topleft",
legtxt,
title="Proportion Democrat",
fill=pal,
bty='n'
)
legend("topright",
c("20%", '50%', '80%'),
pch=20,
pt.cex=c(0.2, 0.5, 0.8),
col=alpha('black', 0.5),
bty='n',
title="Percent\nTurnout"
)
library(spdep)
## Loading required package: Matrix
wards@data$SEN_PERC_DEM = with(wards@data, SEN_DEM/SEN_TOT) wards@data$SEN_PERC_TURN = with(wards@data, SEN_TOT/PERSONS1) wards@data$CON_PERC_DEM = with(wards@data, CON_DEM/CON_TOT) wards@data$PRES_PERC_DEM = with(wards@data, PRES_DE/PRES_TO) ### Remove NaN's created from x/0 wards = subset(wards, !is.nan(wards@data$SEN_PERC_DEM)) ### Create the neighborhood structure, ### which defines the error structure ### for the spatial regression neighborhood_binary = poly2nb(wards) list_of_weights = nb2listw(neighborhood_binary, zero.policy=T) ### Run Moran's i to see if there is a spatial component to % dem moran.test(wards@data$SEN_PERC_DEM, list_of_weights, alternative="two.sided")
## ## Moran I test under randomisation ## ## data: wards@data$SEN_PERC_DEM ## weights: list_of_weights ## ## Moran I statistic standard deviate = 15.295, p-value < 2.2e-16 ## alternative hypothesis: two.sided ## sample estimates: ## Moran I statistic Expectation Variance ## 0.567975167 -0.003623188 0.001396579
### Run spatial regression
spat_lin_reg = spautolm(
SEN_PERC_DEM ~ WHITE + BLACK + PRES_PERC_DEM + CON_PERC_DEM,
data=wards,
family="SAR",
listw=list_of_weights)